Control Organoid samples from YFP

samples

AD4,AD8,AD12,AD16 are control samples

scanpy==1.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.16.5 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.23.1 statsmodels==0.10.1 python-igraph==0.7.1 louvain==0.6.1
'/n/scratch3/groups/hsph/hbc/pjb40/scratch/TimeSeries_10X/data/velocyto_analysis/Only_controlN_Tumor/control_redo2'

Load filtered matrix

  • together with adata and bdata from the velocity analysis, which are also filtered
Check the matrix for QC

Normalize and log transform

Perform a preliminary clustering so that normalization can be done within clusters
Make a copy of the data (IMPORTANT)
normalizing counts per cell
    finished (0:00:00)
extracting highly variable genes
    finished (0:00:02)
--> added
    'highly_variable', boolean vector (adata.var)
    'means', float vector (adata.var)
    'dispersions', float vector (adata.var)
    'dispersions_norm', float vector (adata.var)
Save the raw data (IMPORTANT)
Do the actual filtering
Regress out the effect of mitochondrial (MT) content and total counts, then scale, for this dataset
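
As a rough illustration of what the "normalizing counts per cell" step followed by the log transform does to the matrix, here is a minimal pure-Python sketch. The target of 10,000 counts per cell and the toy counts are assumptions for illustration only; the log above does not state the value used in this run.

```python
import math

def normalize_and_log(counts, target_sum=10_000):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    `target_sum` is an assumed value, not one taken from this notebook.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with zero total counts stay all-zero to avoid division by zero.
        factor = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * factor) for value in cell])
    return normalized

# Toy matrix: 2 cells x 3 genes (made-up counts).
toy_counts = [[10, 0, 30], [1, 1, 2]]
toy_norm = normalize_and_log(toy_counts)
```

After this step, cells with very different sequencing depth become comparable, and zero counts stay at zero because log(1 + 0) = 0.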

PCA

computing PCA
    on highly variable genes
    with n_comps=50
    finished (0:00:12)
Number of cells at each time point (DAY)
DAY-14    5266
DAY-4     1664
DAY-7     1286
DAY-10     983
Name: DAY, dtype: int64

Compute neighborhood graph

computing neighbors
    using 'X_pca' with n_pcs = 30
    finished: added to `.uns['neighbors']`
    `.obsp['distances']`, distances for each pair of neighbors
    `.obsp['connectivities']`, weighted adjacency matrix (0:00:08)

Now compute the UMAP and then the Louvain clustering

Embedding neighborhood graph

computing UMAP
    finished: added
    'X_umap', UMAP coordinates (adata.obsm) (0:00:26)
computing Diffusion Maps using n_comps=15(=n_dcs)
computing transitions
    finished (0:00:00)
    eigenvalues of transition matrix
    [1.         0.99555653 0.9928839  0.99237084 0.99143106 0.9868962
     0.98214227 0.9810217  0.9789502  0.97598475 0.9730472  0.9705779
     0.96859324 0.96752876 0.9666512 ]
    finished: added
    'X_diffmap', diffmap coordinates (adata.obsm)
    'diffmap_evals', eigenvalues of transition matrix (adata.uns) (0:00:00)
drawing single-cell graph using layout 'fa'
    finished: added
    'X_draw_graph_fa', graph_drawing coordinates (adata.obsm) (0:01:19)
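
The embedding steps logged above could be reproduced roughly as follows. This is a sketch, not the original cells: the parameter values are read off the log (50 PCs on highly variable genes, 30 PCs for the neighbor graph, 15 diffusion components, layout 'fa'), while the function name, the parameter dictionary and the exact call order are assumptions.

```python
# Parameters read off the log output above.
EMBEDDING_PARAMS = {
    "pca_n_comps": 50,
    "neighbors_n_pcs": 30,
    "diffmap_n_comps": 15,
    "graph_layout": "fa",
}

def compute_embeddings(adata, params=EMBEDDING_PARAMS):
    """Run PCA, neighbor graph, UMAP, diffusion map and graph layout in place.

    Hypothetical helper; expects an AnnData object with `highly_variable` in `.var`.
    """
    # Imported lazily so the parameter table can be inspected without scanpy installed.
    import scanpy as sc

    # use_highly_variable matches "on highly variable genes" in the log
    # (valid for the scanpy 1.5 series used here; newer releases may warn).
    sc.tl.pca(adata, n_comps=params["pca_n_comps"], use_highly_variable=True)
    sc.pp.neighbors(adata, n_pcs=params["neighbors_n_pcs"])
    sc.tl.umap(adata)
    sc.tl.diffmap(adata, n_comps=params["diffmap_n_comps"])
    sc.tl.draw_graph(adata, layout=params["graph_layout"])
    return adata
```

Keeping the parameters in one dictionary makes it easier to rerun the same settings on the tumor samples or in the velocity notebook.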

Clustering the neighborhood graph

l = ['louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5']

for i in l:
    # key is 'louvain_r<resolution>': drop the 'r' prefix and convert to float
    res = float(i.split("_")[1][1:])
    sc.tl.louvain(adata_pp, resolution=res, key_added=i)
    print(i, res)
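
An alternative to parsing the resolution back out of each key is to start from the numeric resolutions and derive the key names from them, so the two cannot drift apart. A minimal sketch; the helper names are hypothetical, and only `sc.tl.louvain` with its `resolution` and `key_added` arguments comes from scanpy.

```python
RESOLUTIONS = [0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]

def louvain_key(resolution):
    """Build the .obs column name for one resolution, e.g. 0.2 -> 'louvain_r0.2'."""
    return f"louvain_r{resolution}"

def resolution_by_key(resolutions=RESOLUTIONS):
    """Map each key name to its numeric resolution, preserving order."""
    return {louvain_key(r): r for r in resolutions}

def run_louvain_sweep(adata, resolutions=RESOLUTIONS):
    """Cluster `adata` once per resolution (hypothetical helper)."""
    import scanpy as sc  # lazy import; only needed when actually clustering

    for key, res in resolution_by_key(resolutions).items():
        sc.tl.louvain(adata, resolution=res, key_added=key)
        print(key, res)
```

Because the keys are generated, a missing comma can no longer silently fuse two names into one entry such as 'louvain_r0.4louvain_r0.5'.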

Run the Louvain clustering at different resolutions
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 1 clusters and added
    'louvain_r0.01', the cluster labels (adata.obs, categorical) (0:00:01)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 1 clusters and added
    'louvain_r0.025', the cluster labels (adata.obs, categorical) (0:00:01)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 2 clusters and added
    'louvain_r0.05', the cluster labels (adata.obs, categorical) (0:00:01)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 4 clusters and added
    'louvain_r0.1', the cluster labels (adata.obs, categorical) (0:00:01)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 6 clusters and added
    'louvain_r0.2', the cluster labels (adata.obs, categorical) (0:00:01)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 9 clusters and added
    'louvain_r0.3', the cluster labels (adata.obs, categorical) (0:00:00)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 10 clusters and added
    'louvain_r0.4', the cluster labels (adata.obs, categorical) (0:00:01)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished: found 11 clusters and added
    'louvain_r0.5', the cluster labels (adata.obs, categorical) (0:00:00)
computing UMAP
    finished: added
    'X_umap', UMAP coordinates (adata.obsm) (0:00:25)

Check the QC after clustering

Plotting the PCA, UMAP, diffmap and draw_graph embeddings
['louvain_r0.01',
 'louvain_r0.025',
 'louvain_r0.05',
 'louvain_r0.1',
 'louvain_r0.2',
 'louvain_r0.3',
 'louvain_r0.4louvain_r0.5']
Plot all the resolutions
Number of cells in each cluster
louvain_r0.2
0 3399
1 2047
2 1824
3 886
4 816
5 227
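
A quick sanity check on the two count tables: both the per-DAY counts and the louvain_r0.2 cluster sizes should add up to the 9199 cells (n_obs) reported for the AnnData object in this notebook. The sketch below copies the numbers from the outputs above; the variable and function names are illustrative.

```python
# Counts copied from the notebook outputs.
cells_per_day = {"DAY-14": 5266, "DAY-4": 1664, "DAY-7": 1286, "DAY-10": 983}
cells_per_cluster = {"0": 3399, "1": 2047, "2": 1824, "3": 886, "4": 816, "5": 227}
N_OBS = 9199  # n_obs reported for the AnnData object

def fractions(counts):
    """Return each group's share of the total number of cells."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Every cell has exactly one DAY label and one cluster label.
assert sum(cells_per_day.values()) == N_OBS
assert sum(cells_per_cluster.values()) == N_OBS

day_fractions = fractions(cells_per_day)
cluster_fractions = fractions(cells_per_cluster)
```

The fractions make the imbalance explicit: DAY-14 contributes more than half of all cells, which is worth keeping in mind when reading the cluster composition.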

Finding marker genes

ranking genes
    finished: added to `.uns['rank_genes_r0.2']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:03)
Plot marker genes
Top 5 genes in each cluster (columns: cluster, rows: rank)
   0       1       2         3      4        5
0  Chil1   Clu     Txnrd1    Hopx   Trf      Cp
1  Lamp3   Ffar4   Hsp90aa1  Ager   Hp       Ifitm3
2  Lyz2    Krt7    Ran       Emp2   Epas1    Cbr2
3  Scd1    Ly6a    Gclc      Clic5  Slc34a2  Gstm2
4  Elovl1  S100a6  Tubb5     Sparc  Scd1     Pdcd4
The list to check the genes (N_n: gene name, N_p: p-value for cluster N)
   0_n     0_p  1_n     1_p  2_n       2_p            3_n    3_p  4_n      4_p            5_n     5_p
0  Chil1   0.0  Clu     0.0  Txnrd1    9.556727e-269  Hopx   0.0  Trf      1.188312e-159  Cp      4.855302e-155
1  Lamp3   0.0  Ffar4   0.0  Hsp90aa1  1.005675e-240  Ager   0.0  Hp       4.120648e-129  Ifitm3  3.406921e-134
2  Lyz2    0.0  Krt7    0.0  Ran       4.452678e-238  Emp2   0.0  Epas1    6.338847e-134  Cbr2    3.160241e-95
3  Scd1    0.0  Ly6a    0.0  Gclc      1.088050e-231  Clic5  0.0  Slc34a2  1.362970e-114  Gstm2   6.524714e-89
4  Elovl1  0.0  S100a6  0.0  Tubb5     2.053580e-219  Sparc  0.0  Scd1     3.658975e-112  Pdcd4   5.721815e-69
Add the cell annotation and intersect the genes (columns: cluster)
      0    1    2    3    4    5
Stem  0.0  0.0  0.0  0.0  0.0  0.0
AT2   9.0  7.0  3.0  8.0  9.0  4.0
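
The intersection step can be sketched as a set overlap between each cluster's marker genes and a per-cell-type gene list. The cluster genes below are the top 5 from the table above. The AT2 list is a short illustrative set of commonly used AT2 markers, NOT the annotation list used in this notebook, so the counts will not match the table above, which was presumably computed on longer lists.

```python
# Top 5 marker genes per louvain_r0.2 cluster, copied from the table above.
top5_by_cluster = {
    "0": ["Chil1", "Lamp3", "Lyz2", "Scd1", "Elovl1"],
    "1": ["Clu", "Ffar4", "Krt7", "Ly6a", "S100a6"],
    "2": ["Txnrd1", "Hsp90aa1", "Ran", "Gclc", "Tubb5"],
    "3": ["Hopx", "Ager", "Emp2", "Clic5", "Sparc"],
    "4": ["Trf", "Hp", "Epas1", "Slc34a2", "Scd1"],
    "5": ["Cp", "Ifitm3", "Cbr2", "Gstm2", "Pdcd4"],
}

# Illustrative marker set only (assumption); replace with the real annotation list.
example_markers = {"AT2": {"Sftpc", "Lamp3", "Lyz2", "Slc34a2"}}

def marker_overlap(cluster_genes, marker_sets):
    """Count, per cell type and cluster, how many cluster genes are known markers."""
    return {
        cell_type: {
            cluster: len(markers & set(genes))
            for cluster, genes in cluster_genes.items()
        }
        for cell_type, markers in marker_sets.items()
    }

overlap = marker_overlap(top5_by_cluster, example_markers)
```

The result has the same layout as the table above: one row per cell type and one entry per cluster.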

Plotting AT2 genes on UMAP

Plot the AT2 genes
WARNING: The title list is shorter than the number of panels. Using 'color' value instead for some plots.

Compare to a single cluster: Cluster 1

ranking genes
    finished: added to `.uns['rank_genes_groups']`
    'names', sorted np.recarray to be indexed by group ids
    'scores', sorted np.recarray to be indexed by group ids
    'logfoldchanges', sorted np.recarray to be indexed by group ids
    'pvals', sorted np.recarray to be indexed by group ids
    'pvals_adj', sorted np.recarray to be indexed by group ids (0:00:17)
Besides UMAP, the diffmap visualisation also confirms the results
Plot how many cells are in each cluster
Look at which sample the Lyz2 expression is coming from
Check for Hmga2

Plots for AT2 markers

Club cell markers

plots for club cell markers 

save the results

Dotplots

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-2-3a3d5c2d7b7e> in <module>
----> 1 sc.pl.umap(adata_pp, color=['louvain_r0.2','DAY'], frameon=True, use_raw=False )

NameError: name 'sc' is not defined
<bound method AnnData.var_keys of AnnData object with n_obs × n_vars = 9199 × 1529 
    obs: 'DAY', 'batch', 'sample', 'n_counts', 'log_counts', 'n_genes', 'percent_mito', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'Clusters', '_X', '_Y', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'sample_batch', 'louvain_r0.01', 'louvain_r0.025', 'louvain_r0.05', 'louvain_r0.1', 'louvain_r0.2', 'louvain_r0.3', 'louvain_r0.4', 'louvain_r0.5'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'DAY_colors', 'diffmap_evals', 'draw_graph', 'louvain', 'louvain_r0.01_colors', 'louvain_r0.025_colors', 'louvain_r0.05_colors', 'louvain_r0.1_colors', 'louvain_r0.2_colors', 'louvain_r0.3_colors', 'louvain_r0.4_colors', 'louvain_r0.5_colors', 'neighbors', 'pca', 'rank_genes_groups', 'rank_genes_r0.2', 'sample_colors', 'umap'
    obsm: 'X_diffmap', 'X_draw_graph_fa', 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'ambiguous', 'counts', 'matrix', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'>
0 1 2 3 4 5
Stem 0.0 0.0 0.0 0.0 0.0 0.0

STOP HERE AND DO NOT RUN BELOW. REFER TO THE VELOCITY NOTEBOOK.